Abstract

The analysis of covariance (ANCOVA) as a procedure for statistical
correction of the effects of an extraneous variable, called a
covariate, is presented. A heuristic data set is used to make
the calculation of the ANCOVA partitions easier to
follow. Homogeneity of regression is discussed as an essential
condition to be met when conducting ANCOVA. Data
reliability and the interpretation of the residualized dependent
variable are also discussed as major issues when applying ANCOVA.
It is suggested that caution be exercised when applying ANCOVA
to statistically correct for differences due to a covariate.

The Use of Analysis of Covariance: User Beware



Analysis of variance methods and their analogs (ANCOVA,
MANOVA, MANCOVA) remain tools of choice for many researchers
(Thompson, 1988). Willson (1980) examined the articles published
in AERJ from 1969 to 1978 and found that ANOVA and ANCOVA were
used in 41% of the articles. Elmore and Woehlke (1988) studied
several volumes of educational journals published from 1978 to
1987 and found that ANOVA/ANCOVA were the most frequently used
research methods. Though ANOVA methods are frequently used,
ANCOVA is used much less often and has not been widely used in
published behavioral science research (Loftin & Madison, 1991;
Thompson, 1992).

Analysis of covariance (ANCOVA) is used primarily as a
procedure for statistical control of the effects of an extraneous
variable, called a covariate, on the dependent measure (Hinkle,
Wiersma, & Jurs, 1988; Keppel & Zedeck, 1989). Cohen (1968) states
that a covariate is

after all, nothing but an independent
variable, which because of the logic dictated by the
substantive issues of the research, assumes priority
among the set of independent variables as a basis for
accounting for Y variance. (p. 439)

ANCOVA integrates a regression analysis of the dependent
variable on the covariate and an ANOVA on the adjusted (residual
or error) scores on the dependent variable (Huitema, 1980; Loftin
& Madison, 1991; Wildt & Ahtola, 1978). ANCOVA purports to
control for the effects of the covariate by partitioning out the
variance attributed to it (Hinkle, Wiersma, & Jurs, 1988). By
statistically controlling for the variance attributed to the
covariate, the error variance is hopefully reduced. In addition,
the treatment effects can be clarified and the probability of
obtaining statistically significant results will be increased
(Loftin & Madison, 1991), if the assumptions required by ANCOVA
are met.

However, statistical significance is of limited importance in
and of itself. Thompson (1988) indicates that statistical
significance "is not the end-all and be-all of research" (p. 100).
Statistical significance is largely an artifact of sample size.
Since the null hypothesis of no difference is almost always false,
with a large enough sample the null hypothesis will always be
rejected, yielding statistically significant results. In addition,
statistical significance does not provide information about result
importance or generalizability (Carver, 1978).
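
The dependence of the F ratio on sample size can be sketched numerically. The following is a minimal illustration, not part of the original analysis: the proportion of variance explained is held fixed and only the number of cases is allowed to grow. The function name `f_for_fixed_effect` and the sample sizes are assumptions made for illustration; the effect size is the ratio 73.17 / 237.67 taken from the ANOVA on the heuristic data set (Table 4).

```python
def f_for_fixed_effect(eta_sq: float, n: int, k: int) -> float:
    """F ratio of a one-way ANOVA with k groups and n cases,
    given eta squared = between sum of squares / total sum of squares."""
    ms_between = eta_sq / (k - 1)
    ms_within = (1.0 - eta_sq) / (n - k)
    return ms_between / ms_within


# Between / total sum of squares from Table 4 (hypothetical constant name).
ETA_SQ = 73.17 / 237.67

# Same effect size, increasing sample size (sample sizes are illustrative).
for n in (12, 30, 120, 1200):
    print(n, round(f_for_fixed_effect(ETA_SQ, n, k=3), 2))
```

With the effect size unchanged, the F ratio grows roughly in proportion to the error degrees of freedom, which is the sense in which a "significant" result can be purchased with sample size alone.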

Figure 1 illustrates the partitioning of variance in ANCOVA.
The area inside the circle represents the total variance of the
dependent variable. The proportions of variance attributed to the
treatment effects and to the covariate are shown.

INSERT FIGURE 1 ABOUT HERE 



Huitema (1980) indicates that ANCOVA has two advantages when
it is correctly applied. ANCOVA provides greater power against
Type II error and reduces the bias caused by differences that
exist among groups before experimental treatments are considered.
However, due to the erroneous belief that ANCOVA will always
provide "control" and "power", the method is used by researchers
even in cases in which ANCOVA is not appropriate (Thompson, 1988).

Various conditions should be met in order to perform ANCOVA 
correctly (Elashoff, 1969; Bump, 1991). ANCOVA assumes a high 
correlation between the covariate and the dependent variable. If 
this condition is not met, the covariate will do little to reduce 
the error sum of squares, which is the primary objective of ANCOVA 
(Loftin & Madison, 1991). 
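
How large the correlation must be can be made concrete. Because one error degree of freedom is spent on each covariate, the adjusted error mean square is smaller than the unadjusted one only when the squared pooled within group correlation exceeds 1 / df, where df is the unadjusted within group degrees of freedom. The sketch below illustrates this break-even point using the within group values of the heuristic data set (Table 4); the function name `adjusted_error_ms` and the trial correlations are assumptions made for illustration.

```python
def adjusted_error_ms(sos_w: float, df_w: int, r_w: float) -> float:
    """Adjusted within group mean square with one covariate:
    SOS_W (1 - r_w^2) divided by (df_w - 1)."""
    return sos_w * (1.0 - r_w ** 2) / (df_w - 1)


SOS_W, DF_W = 164.50, 9              # within group values from Table 4
unadjusted_ms = SOS_W / DF_W         # error mean square without the covariate
break_even_r = (1.0 / DF_W) ** 0.5   # r_w^2 must exceed 1 / df_w to gain anything

print("unadjusted MS:", round(unadjusted_ms, 2))
for r_w in (0.10, 0.32, break_even_r, 0.60, 0.90):
    print(round(r_w, 3), round(adjusted_error_ms(SOS_W, DF_W, r_w), 2))
```

For the heuristic data the pooled within group correlation (about .32) falls just below the break-even value of 1/3, so the covariate does not reduce the error mean square at all.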

ANCOVA also requires that the covariate be unaffected by
the independent variable(s), so that the covariate and the
independent variable(s) explain different portions
of the total variance of the dependent variable. In addition,
the residualized dependent variable is assumed to be normally
distributed for each level of the independent variable, and the
variances of the residualized dependent variable for each level of
the independent variable are assumed to be equal. Next, ANCOVA
assumes a linear relationship between the covariate and the
dependent variable. A linear relationship implies that a change
in magnitude on the covariate is presumed to cause a proportional
change on the dependent variable at each level of the covariate.
Finally, ANCOVA requires that the regression slopes between the
covariate and the dependent variable be parallel for each
independent variable group. This is known as homogeneity of
regression.

The purpose of the present paper is to explain the
computational processes of ANCOVA partitions. Homogeneity of
regression is discussed as an essential condition to be met when
conducting ANCOVA. Data reliability and the
interpretation of the residualized dependent variable as major
issues when applying ANCOVA are also discussed.

Computing the adjusted sum of squares for ANCOVA

Like ANOVA, ANCOVA is often used to test whether group means
differ. However, in the ANCOVA case, the means have been adjusted
for differences between the groups on the covariate(s) (Huck,
Cormier, & Bounds, 1974).
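
The adjustment of the means can be sketched as follows. This is a minimal illustration under the usual textbook definition, which the paper itself does not spell out: each group mean on the dependent variable is shifted by the pooled within group slope times the distance of the group's covariate mean from the grand covariate mean. The data are the heuristic achievement and IQ scores of Table 2; the names `DATA`, `mean`, and `adjusted_means` are assumptions made for illustration.

```python
# Heuristic data from Table 2: (achievement Y, IQ covariate X) for each group.
DATA = {
    1: ([60, 63, 66, 69], [100, 102, 104, 103]),
    2: ([62, 63, 67, 71], [104, 109, 104, 117]),
    3: ([65, 68, 72, 76], [102, 117, 108, 105]),
}


def mean(values):
    return sum(values) / len(values)


def adjusted_means(data):
    """Return {group: adjusted mean}, using the pooled within group slope b_w."""
    sp_w = ss_x_w = 0.0
    for y, x in data.values():
        mean_y, mean_x = mean(y), mean(x)
        sp_w += sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        ss_x_w += sum((xi - mean_x) ** 2 for xi in x)
    b_w = sp_w / ss_x_w
    grand_x = mean([xi for _, x in data.values() for xi in x])
    # Adjusted mean = observed mean - b_w * (group covariate mean - grand mean).
    return {g: mean(y) - b_w * (mean(x) - grand_x) for g, (y, x) in data.items()}


for group, value in adjusted_means(DATA).items():
    print(group, round(value, 2))
```

Group 1, which has the lowest mean IQ, is adjusted upward, while groups 2 and 3 are adjusted downward; these adjusted means are the quantities compared by the ANCOVA F test.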

As previously discussed, ANCOVA combines regression analysis
and analysis of variance (ANOVA) when adjusting the sums of squares
for the variance attributed to the covariate(s) (Wildt & Ahtola,
1978). Table 1 presents a general summary table for ANCOVA.

INSERT TABLE 1 ABOUT HERE



The calculation of the ANCOVA sums of squares involves several
steps (Hinkle, Wiersma, & Jurs, 1988). Table 2 presents a
heuristic data set used to make the calculation of ANCOVA
partitions easy to follow.

INSERT TABLE 2 ABOUT HERE 

The adjusted total sum of squares ($SOS'_T$) is defined as the
total sum of squares after removing the variance attributed to the
covariate(s). Thus,

$$SOS'_T = SOS_T (1 - r_T^2)$$

where $SOS_T$ is the total sum of squares from the ANOVA on the
dependent variable and $r_T$ is the correlation between all scores on
the dependent variable and the covariate. These calculations are
presented in Table 3.
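
The adjusted total sum of squares can be checked directly from the raw scores. The sketch below pools all 12 cases of the heuristic data set (Table 2), computes the total correlation between the dependent variable and the covariate, and applies the formula above. The variable and function names are assumptions made for illustration.

```python
# Heuristic data from Table 2, all 12 cases pooled across the three groups.
Y = [60, 63, 66, 69, 62, 63, 67, 71, 65, 68, 72, 76]              # achievement
X = [100, 102, 104, 103, 104, 109, 104, 117, 102, 117, 108, 105]  # IQ covariate


def deviation_sums(x, y):
    """Return (sum of squares of x, sum of squares of y, sum of cross products)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return ss_x, ss_y, sp_xy


ss_x, sos_total, sp_xy = deviation_sums(X, Y)
r_total = sp_xy / (ss_x * sos_total) ** 0.5      # correlation ignoring groups
sos_total_adj = sos_total * (1 - r_total ** 2)   # adjusted total sum of squares

print(round(sos_total, 2), round(r_total, 4), round(sos_total_adj, 2))
```

The unrounded correlation is approximately .4073, which is why the multiplier in Table 3 is .8341 rather than the .8319 that a correlation rounded to .41 would give.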

INSERT TABLE 3 ABOUT HERE 

The SAS commands to perform ANOVA/ANCOVA are presented in 
Appendix A. Table 4 presents an ANOVA summary table for the data 
set, as a basis for comparison with the results for the same data 
once a covariance correction is involved. 

INSERT TABLE 4 ABOUT HERE



The adjusted within group sum of squares is defined as

$$SOS'_W = SOS_W (1 - r_W^2)$$

where $SOS_W$ is the within group sum of squares from the
ANOVA on the dependent variable and $r_W$ is the pooled within group
correlation coefficient between the scores on the dependent variable
and the covariate(s).
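
A minimal sketch of this step for the heuristic data set (Table 2) follows. The pooled correlation is computed here from within group deviation scores; with equal group sizes this gives the same value as the raw score formula of Appendix B. The names `GROUPS` and `pooled_within_sums` are assumptions made for illustration.

```python
# Heuristic data from Table 2: (achievement Y, IQ covariate X) for each group.
GROUPS = {
    1: ([60, 63, 66, 69], [100, 102, 104, 103]),
    2: ([62, 63, 67, 71], [104, 109, 104, 117]),
    3: ([65, 68, 72, 76], [102, 117, 108, 105]),
}


def pooled_within_sums(groups):
    """Sum the within group deviation sums of squares and cross products."""
    ss_x = ss_y = sp_xy = 0.0
    for y, x in groups.values():
        mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
        ss_x += sum((xi - mean_x) ** 2 for xi in x)
        ss_y += sum((yi - mean_y) ** 2 for yi in y)
        sp_xy += sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return ss_x, ss_y, sp_xy


ss_x_w, sos_within, sp_w = pooled_within_sums(GROUPS)
r_within = sp_w / (ss_x_w * sos_within) ** 0.5     # pooled within group correlation
sos_within_adj = sos_within * (1 - r_within ** 2)  # adjusted within sum of squares

print(round(sos_within, 2), round(r_within, 4), round(sos_within_adj, 2))
```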

The calculation of the adjusted between group sum of squares
involves several somewhat complicated steps. For the purpose of
this paper, $SOS'_B$ will be calculated as

$$SOS'_B = SOS'_T - SOS'_W$$

where $SOS'_T$ is the adjusted total sum of squares and $SOS'_W$ is the
adjusted within group sum of squares.

The sum of squares for the covariate is computed by
subtracting the adjusted total sum of squares ($SOS'_T$) from the
total sum of squares ($SOS_T$),

$$SOS_{cov} = SOS_T - SOS'_T$$

Table 5 presents a summary table for the ANCOVA partitions for the
data set.
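
The full set of partitions can be reproduced in a few lines. The sketch below is an illustration only: it computes the total, covariate, adjusted between group, and adjusted within group sums of squares for the heuristic data set (Table 2) by subtraction, as described above. It uses the identity that a sum of squares multiplied by (1 - r^2) equals that sum of squares minus the squared cross product divided by the covariate sum of squares. The names `GROUPS`, `sums`, and `ancova_partitions` are assumptions.

```python
# Heuristic data from Table 2: (achievement Y, IQ covariate X) for each group.
GROUPS = {
    1: ([60, 63, 66, 69], [100, 102, 104, 103]),
    2: ([62, 63, 67, 71], [104, 109, 104, 117]),
    3: ([65, 68, 72, 76], [102, 117, 108, 105]),
}


def sums(x, y):
    """Return (SS of x, SS of y, cross products) about the means of x and y."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return (
        sum((xi - mean_x) ** 2 for xi in x),
        sum((yi - mean_y) ** 2 for yi in y),
        sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)),
    )


def ancova_partitions(groups):
    all_y = [v for y, _ in groups.values() for v in y]
    all_x = [v for _, x in groups.values() for v in x]
    ss_x_t, sos_t, sp_t = sums(all_x, all_y)
    within = [sums(x, y) for y, x in groups.values()]
    ss_x_w = sum(w[0] for w in within)
    sos_w = sum(w[1] for w in within)
    sp_w = sum(w[2] for w in within)
    sos_t_adj = sos_t - sp_t ** 2 / ss_x_t   # same as SOS_T (1 - r_T^2)
    sos_w_adj = sos_w - sp_w ** 2 / ss_x_w   # same as SOS_W (1 - r_W^2)
    return {
        "total": sos_t,
        "covariate": sos_t - sos_t_adj,
        "between_adj": sos_t_adj - sos_w_adj,
        "within_adj": sos_w_adj,
    }


for source, value in ancova_partitions(GROUPS).items():
    print(source, round(value, 2))
```

By construction the covariate, adjusted between group, and adjusted within group sums of squares add up to the total sum of squares.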

INSERT TABLE 5 ABOUT HERE 

Computing the ANCOVA test statistic and testing significance

The adjusted sum of squares partitions can be used to test
the statistical significance of differences in group means after
adjustment for the covariate(s) is made (Hinkle, Wiersma, & Jurs,
1988). The test statistic for ANCOVA (F) is the ratio of the
adjusted between group mean square ($MS'_B$) to the adjusted within
group mean square ($MS'_W$),

$$F = MS'_B / MS'_W$$

To test significance, the obtained value of F (i.e.,
"observed F", "F calculated", or "F ratio") is compared to the
critical value of F, obtained from a table of critical values
found in most statistics books. The critical value of F is
obtained using the degrees of freedom of the adjusted between group
sum of squares ($SOS'_B$) for the numerator and the degrees of
freedom of the adjusted within group sum of squares ($SOS'_W$) for
the denominator, at an alpha level predetermined by the researcher.

When the observed value of F exceeds the critical value of F,
the null hypothesis of no difference among the adjusted means can
be rejected, implying that the result is statistically
significant. If the observed value of F is less than the critical
value of F, then the null hypothesis is not rejected.
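
The decision rule can be sketched with the values reported for the heuristic data set. The adjusted sums of squares and the critical value of 4.46 are taken from Tables 3 and 5; the function name `ancova_f` and its `n_covariates` parameter are assumptions made for illustration, with one within group degree of freedom deducted per covariate as in Table 1.

```python
def ancova_f(sos_between_adj, sos_within_adj, n, k, n_covariates=1):
    """Return (F, numerator df, denominator df) for a one-way ANCOVA."""
    df_between = k - 1
    df_within = n - k - n_covariates   # one df is lost for each covariate
    ms_between = sos_between_adj / df_between
    ms_within = sos_within_adj / df_within
    return ms_between / ms_within, df_between, df_within


# Adjusted sums of squares from Table 3; 12 cases in 3 groups, one covariate.
f_obs, df1, df2 = ancova_f(50.79, 147.45, n=12, k=3)

F_CRITICAL = 4.46  # tabled value for alpha = .05 with (2, 8) df, as in Table 5
decision = "reject" if f_obs > F_CRITICAL else "do not reject"
print(round(f_obs, 2), (df1, df2), decision)
```

For these data the observed F falls well short of the critical value, so the null hypothesis of equal adjusted means is not rejected.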

Obtaining statistically significant results is not indicative
of result importance, replicability, or generalizability (Carver,
1978). It only indicates that a decision to reject or not reject
the null hypothesis has been made (Haase & Thompson, 1992;
Thompson, 1993). The only conclusion that can be drawn at this
point is that at least one pair or combination of adjusted means
differs. To determine which pair or combination of means differs,
a post hoc analysis must be conducted (Hinkle, Wiersma, & Jurs,
1988). However, Thompson (1988) discusses the limitations of post
hoc comparisons and the advantages of a priori or planned
comparisons, which can also be used in ANCOVA. A priori or planned
comparisons provide more power against Type II error and force the
researcher to be more thoughtful in conducting research.

Keppel and Zedeck (1989) discuss a multiple regression 
approach to ANCOVA involving a hierarchical strategy "in which the 
covariate is the first variable (or set of variables) to be 
entered into the analysis and the vectors representing the 
independent variable are entered next" (p. 457). For a thorough 
discussion of this approach see Keppel and Zedeck (1989). 
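
The bookkeeping of this hierarchical strategy can be sketched from the sums of squares already reported, without fitting the regressions themselves. The sketch rests on two facts: the residual sum of squares after entering the covariate alone is the adjusted total sum of squares, and the residual after also entering the group vectors is the adjusted within group sum of squares. The constant names below are assumptions made for illustration; the values come from Tables 3 and 4.

```python
SOS_T = 237.67       # total sum of squares (Table 4)
SOS_T_ADJ = 198.24   # residual after entering the covariate alone (Table 3)
SOS_W_ADJ = 147.45   # residual after entering covariate and group vectors (Table 3)
N, K = 12, 3         # cases and groups in the heuristic data set

r2_step1 = 1 - SOS_T_ADJ / SOS_T   # step 1: covariate only
r2_step2 = 1 - SOS_W_ADJ / SOS_T   # step 2: covariate plus K - 1 group vectors
r2_change = r2_step2 - r2_step1    # increment attributable to group membership

df_change = K - 1
df_residual = N - K - 1
f_change = (r2_change / df_change) / ((1 - r2_step2) / df_residual)

print(round(r2_step1, 4), round(r2_step2, 4), round(f_change, 2))
```

The F for the increment in R squared at the second step is the same F ratio obtained from the adjusted mean squares, which is why the two approaches are interchangeable.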

The homogeneity of regression assumption

As previously discussed, various conditions should be met to
perform ANCOVA correctly. ANCOVA requires that the slope of the
regression line be the same for all treatment groups. This is
the so-called homogeneity of regression assumption (Wildt &
Ahtola, 1978). Loftin and Madison (1991) argue that "this is
exactly where most applications of ANCOVA fail, since researchers
quite often have truly non equivalent K groups for which the
regression slopes indeed are different". 

What ANCOVA does, according to Thompson (1992), is to create
a single pooled regression equation, ignoring group assignment, to
calculate the adjustment in the dependent variable using the
covariate(s). This pooled equation is created by assuming that
the equations are the same across groups and that an "average"
equation can be used for all subjects, ignoring group membership
(Bump, 1992). Then an ANOVA, not ignoring groups, of the
deviations of the residualized scores from the regression line is
performed (Cliff, 1987).

Use of the pooled equation is legitimate if and only if
the regression equations of the groups have parallel slopes (Huck,
Cormier, & Bounds, 1974; Thompson, 1992). This homogeneity of
regression assumption requires that the "b" weights applied to the
covariate(s) be reasonably equal across groups. That is, any
change in the covariate(s) will result in the same proportionate
adjustment in the dependent variable for each of the K levels of
the independent variable.
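
Whether the "b" weights are in fact similar can be examined directly. The sketch below computes a separate regression slope for each group of the heuristic data set (Table 2) and the usual F ratio for heterogeneity of slopes, which compares the residual left by one pooled slope with the residual left when each group keeps its own slope. This should correspond to the GROUP*COV term in the second PROC GLM step of Appendix A, although that correspondence is stated here as an assumption rather than verified output. The names `GROUPS`, `sums`, and `slope_homogeneity` are illustrative.

```python
# Heuristic data from Table 2: (achievement Y, IQ covariate X) for each group.
GROUPS = {
    1: ([60, 63, 66, 69], [100, 102, 104, 103]),
    2: ([62, 63, 67, 71], [104, 109, 104, 117]),
    3: ([65, 68, 72, 76], [102, 117, 108, 105]),
}


def sums(x, y):
    """Return (SS of x, SS of y, cross products) about the means of x and y."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return (
        sum((xi - mean_x) ** 2 for xi in x),
        sum((yi - mean_y) ** 2 for yi in y),
        sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)),
    )


def slope_homogeneity(groups):
    per_group = {g: sums(x, y) for g, (y, x) in groups.items()}
    slopes = {g: sp / ss_x for g, (ss_x, _, sp) in per_group.items()}
    # Residual SS when every group keeps its own slope.
    ss_separate = sum(ss_y - sp ** 2 / ss_x for ss_x, ss_y, sp in per_group.values())
    # Residual SS when one pooled within group slope is imposed.
    ss_x_w = sum(s[0] for s in per_group.values())
    ss_y_w = sum(s[1] for s in per_group.values())
    sp_w = sum(s[2] for s in per_group.values())
    ss_pooled = ss_y_w - sp_w ** 2 / ss_x_w
    k = len(groups)
    n = sum(len(y) for y, _ in groups.values())
    df_num, df_den = k - 1, n - 2 * k
    f = ((ss_pooled - ss_separate) / df_num) / (ss_separate / df_den)
    return slopes, f, (df_num, df_den)


slopes, f_slopes, dfs = slope_homogeneity(GROUPS)
print({g: round(b, 3) for g, b in slopes.items()}, round(f_slopes, 2), dfs)
```

For the heuristic data the three slopes are far from parallel descriptively (roughly 1.89, 0.48, and -0.05), yet with only four cases per group the F ratio for heterogeneity is small, which illustrates how little protection a significance test of this assumption offers in small samples.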

Two other conditions ought to be met in order to perform
ANCOVA correctly. One deals with the reliability of the
covariate(s). As Thompson (1992) points out,

researchers often incorrectly presume that the
characteristic of reliability inures to tests, when in
fact reliability is a characteristic of a given set of
data collected using a given time from a given set of
subjects using a given protocol (p. xii).

Due to this erroneous belief, many researchers do not check and do
not report the reliability of their data. Loftin and Madison
(1991) argue that the covariate(s) used must be especially
reliable, "or one will end up potentially adjusting sampling error
with measurement error, and creating a mess" (p. 145).

The other condition involves the interpretation of the
residualized dependent variable. As discussed, ANCOVA is used to
correct for the effects of one or more covariates on the dependent
variable (Huck, Cormier, & Bounds, 1974). The residual is then
analyzed, thus partitioning the effects of the treatment.
However, as Thompson (1992) states, the use of statistical
correction may be dangerous, especially when using multiple
covariates. It may result in the analysis of a dependent variable
that no longer makes sense.

Summary

The analysis of covariance is used to statistically correct
for the effect of an extraneous variable. The purpose is to
adjust for group differences that exist before the treatment is
applied. However, several conditions should be met when applying
ANCOVA. Due to the erroneous belief that ANCOVA will always
provide "control" and "power", the method is applied even when it
is not appropriate. Caution should be exercised when applying ANCOVA
as a method for statistical correction, especially when using
multiple covariates; the researcher may end up with a dependent
variable that does not make any sense. The reliability of the
data and the interpretation of the residual are issues of concern
when applying statistical correction methods.

Table 1.

General summary table for ANCOVA

Source       SOS    df       MS    F    Fcv

Covariate           1
Between             k-1
Within              n-k-1
Total               n-1

$SOS_{cov} = SOS_T \times r_T^2$, where $r_T$ is the correlation between
the covariate and the dependent variable
k = number of groups
n = sample size
$MS_{cov} = SOS_{cov} / df_{cov}$
$MS'_B = SOS'_B / df'_B$
$MS'_W = SOS'_W / df'_W$
$F = MS'_B / MS'_W$

Table 2.

Data set and analysis for ANCOVA example

Group   Achievement   IQ
        Score (Y)     Covariate

1       60            100
1       63            102
1       66            104
1       69            103
2       62            104
2       63            109
2       67            104
2       71            117
3       65            102
3       68            117
3       72            108
3       76            105

Group   n   Mean (Y)   Std Dev (Y)   Mean (Cov)   Std Dev (Cov)

1       4   64.50      3.87          102.25       1.71
2       4   65.75      4.11          108.50       6.14
3       4   70.25      4.79          108.00       6.48

Table 3.

Calculation of ANCOVA partitions

$SOS'_T = SOS_T (1 - r_T^2)$
        = 237.67 (1 - .4073(.4073))
        = 237.67 (.8341)
        = 198.24

$SOS'_W = SOS_W (1 - r_W^2)$ *
        = 164.50 (1 - .3220(.3220))
        = 164.50 (1 - .1037)
        = 164.50 (.8963)
        = 147.45

* See Appendix B for the calculation of $r_W$.

$SOS'_B = SOS'_T - SOS'_W$
        = 198.24 - 147.45
        = 50.79

$SOS_{cov} = SOS_T - SOS'_T$
           = 237.67 - 198.24
           = 39.43

Table 4.

ANOVA summary for data set

Source     SOS      df   MS      F      Fcv

Between     73.17    2   36.58   2.00   4.26
Within     164.50    9   18.28
Total      237.67   11

Table 5.

ANCOVA summary table for data set

Source      SOS      df   MS      F      Fcv

Covariate    39.43    1   39.43
Between      50.79    2   25.40   1.38   4.46
Within      147.45    8   18.43
Total       237.67   11

Figure 1

Partitioning of variance in ANCOVA

Note: This diagram presumes that the covariate does not overlap
at all with the variance due to treatment.
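
Appendix A
SAS program listings for ANCOVA job

TITLE 'ANCOVA';
* ABC is a fileref that must be assigned to the raw data file beforehand;
* Read group, dependent variable, and covariate from fixed columns;
DATA D1; INFILE ABC;
   INPUT GROUP 1 DEPVAR 3-4 COV 6-8;
PROC PRINT;
   TITLE 'DATA PRINT OUT';
* Correlations among all variables, ignoring group membership;
PROC CORR;
PROC SORT; BY GROUP;
PROC MEANS; BY GROUP;
* One-way ANOVAs on the dependent variable and on the covariate;
PROC GLM;
   CLASSES GROUP;
   MODEL DEPVAR COV = GROUP;
   MEANS GROUP / TUKEY;
* Homogeneity of regression is examined through the GROUP*COV term;
PROC GLM;
   CLASSES GROUP;
   MODEL DEPVAR = COV GROUP GROUP*COV;
* ANCOVA with adjusted (least squares) means;
PROC GLM;
   CLASSES GROUP;
   MODEL DEPVAR = COV GROUP;
   LSMEANS GROUP / PDIFF;
   MEANS GROUP;
RUN;

Source: Hinkle, Wiersma, & Jurs (1988)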

Appendix B
Calculation of $r_W$

Group     X     Y      XY      X^2     Y^2

1       100    60    6000    10000    3600
        102    63    6426    10404    3969
        104    66    6864    10816    4356
        103    69    7107    10609    4761
Sum     409   258   26397    41829   16686

2       104    62    6448    10816    3844
        109    63    6867    11881    3969
        104    67    6968    10816    4489
        117    71    8307    13689    5041
Sum     434   263   28590    47202   17343

3       102    65    6630    10404    4225
        117    68    7956    13689    4624
        108    72    7776    11664    5184
        105    76    7980    11025    5776
Sum     432   281   30342    46782   19809

$$r_W = \frac{\sum_k (n_k \sum XY_k - \sum X_k \sum Y_k)}{\sqrt{\{\sum_k [n_k \sum X_k^2 - (\sum X_k)^2]\} \{\sum_k [n_k \sum Y_k^2 - (\sum Y_k)^2]\}}}$$

Numerator terms, $n_k \sum XY_k - \sum X_k \sum Y_k$:

Group 1:  4(26397) - (409)(258) = 105588 - 105522 =  66
Group 2:  4(28590) - (434)(263) = 114360 - 114142 = 218
Group 3:  4(30342) - (432)(281) = 121368 - 121392 = -24
Sum = 66 + 218 + (-24) = 260

Covariate terms, $n_k \sum X_k^2 - (\sum X_k)^2$:

Group 1:  4(41829) - (409)^2 = 167316 - 167281 =  35
Group 2:  4(47202) - (434)^2 = 188808 - 188356 = 452
Group 3:  4(46782) - (432)^2 = 187128 - 186624 = 504
Sum = 35 + 452 + 504 = 991

Dependent variable terms, $n_k \sum Y_k^2 - (\sum Y_k)^2$:

Group 1:  4(16686) - (258)^2 = 66744 - 66564 = 180
Group 2:  4(17343) - (263)^2 = 69372 - 69169 = 203
Group 3:  4(19809) - (281)^2 = 79236 - 78961 = 275
Sum = 180 + 203 + 275 = 658

991 x 658 = 652078
$\sqrt{652078} = 807.5134$

$r_W = 260 / 807.5134 = 0.321976$